Rename fuzziness/min_similarity to edit_distance #4082

Closed
clintongormley opened this issue Nov 4, 2013 · 1 comment · Fixed by #4587

@clintongormley

Currently we have:

  • flt | flt_field have min_similarity
  • fuzzy has min_similarity
  • query_string has fuzzy_min_sim
  • match has fuzziness
  • the completion suggester uses { fuzzy: { edit_distance: 2}}

Fuzziness in Elasticsearch refers to edit distance, which can be set to 0, 1 or 2.

min_similarity accepts a float value between 0 and 1, but it now gets converted to an edit distance based on word length. For example, a word of two characters with an edit distance of 2 would match any other word of length 2.
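
To make the two-character example concrete, here is a minimal sketch using a plain Levenshtein distance as a stand-in for the matching logic. The `levenshtein` function is written for illustration only and is not Elasticsearch or Lucene code; it simply shows that any two words of length 2 are at most 2 edits apart, so an edit distance of 2 filters nothing at that length.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


# Every pair of two-letter words over a small alphabet is within 2 edits.
two_letter_words = ["".join(p) for p in product("abcd", repeat=2)]
worst_case = max(levenshtein(x, y) for x in two_letter_words for y in two_letter_words)
```

Here `worst_case` is 2: substituting both characters turns any two-letter word into any other.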

I would suggest renaming fuzziness and min_similarity to edit_distance everywhere. It should accept 0, 1, 2 and auto, where auto sets the edit_distance to 1 for words of 1..3 characters, and 2 for words of 4 characters or more.
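
The proposed `auto` rule can be sketched as follows. This is only an illustration of the suggestion in this issue, not the behaviour that was eventually implemented; the function names `auto_edit_distance` and `resolve_edit_distance` are hypothetical, and returning 0 for an empty term is an assumption, since the proposal does not cover that case.

```python
def auto_edit_distance(term: str) -> int:
    """Proposed `auto` rule: 1 edit for 1..3 characters, 2 for 4 or more."""
    if not term:
        return 0  # assumption: the proposal does not define the empty term
    return 1 if len(term) <= 3 else 2


def resolve_edit_distance(value, term: str) -> int:
    """Resolve a user-supplied edit_distance (0, 1, 2 or "auto") for a term."""
    if isinstance(value, str) and value.strip().lower() == "auto":
        return auto_edit_distance(term)
    try:
        # Going through str() rejects floats such as 0.5 instead of truncating.
        distance = int(str(value))
    except ValueError:
        raise ValueError(f"edit_distance must be 0, 1, 2 or auto, got {value!r}")
    if distance not in (0, 1, 2):
        raise ValueError(f"edit_distance must be 0, 1, 2 or auto, got {value!r}")
    return distance
```

With this sketch, `resolve_edit_distance("auto", "cat")` yields 1 and `resolve_edit_distance("auto", "elastic")` yields 2, while a float such as 0.5 is rejected rather than silently reinterpreted.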

The only fly in the ointment is the fuzzy query, which also handles fuzzy numbers and dates that have nothing to do with edit distance. See the proposed deprecation in #4076.

@s1monw
Contributor

s1monw commented Jan 2, 2014

I opened a pull request for this that is split into 2 commits. One commit adds the generalization in terms of naming but handles all the old naming gracefully. The other commit breaks BW compat and updates docs etc. I want to pull one of the commits into 0.90 for easier transition. I also tried to keep most of the defaults in this issue to not do N things at once. @clintongormley can you give it a review?

s1monw closed this as completed in bc5a9ca on Jan 9, 2014
brusic pushed a commit to brusic/elasticsearch that referenced this issue Jan 19, 2014
A lot of different APIs currently use different names for the
same logical parameter. Since Lucene moved away from the notion
of a `similarity` and now uses a `fuzziness`, we should generalize
this and encapsulate the generation, parsing and creation of these
settings across all queries.

This commit adds a new `Fuzziness` class that handles the renaming
and generalization in a backwards compatible manner.

This commit also adds a ParseField class to better support deprecated
Query DSL parameters.

The ParseField class allows specifying parameters that have been deprecated.
Those parameters can be more easily tracked and removed in a future version.
This also allows running queries in `strict` mode per index to throw
exceptions if a query is executed with deprecated keys.

Closes elastic#4082
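
The deprecation handling described in the commit message above can be sketched roughly as follows. This is a hypothetical Python illustration of the idea, not the actual Java ParseField class from the commit; the `match` method, the `DeprecatedKeyError` exception and the choice of `fuzziness` as the preferred name are all assumptions made for the example, using parameter names listed in this issue.

```python
class DeprecatedKeyError(ValueError):
    """Raised in strict mode when a query uses a deprecated parameter name."""


class ParseField:
    """A parameter with one preferred name and any number of deprecated aliases."""

    def __init__(self, preferred: str, *deprecated: str):
        self.preferred = preferred
        self.deprecated = tuple(deprecated)

    def match(self, key: str, strict: bool = False) -> bool:
        """Return True if `key` names this parameter; reject aliases when strict."""
        if key == self.preferred:
            return True
        if key in self.deprecated:
            if strict:
                raise DeprecatedKeyError(
                    f"[{key}] is deprecated, use [{self.preferred}] instead"
                )
            return True
        return False


# Illustrative only: old names from this issue mapped onto one logical parameter.
FUZZINESS = ParseField("fuzziness", "min_similarity", "fuzzy_min_sim")
```

In lenient mode `FUZZINESS.match("min_similarity")` still succeeds, which keeps old queries working; in strict mode the same call raises, which makes deprecated keys easy to find before they are removed.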